Software development

• Software represents more than half of the development cost of an aircraft
• Regulated by international standards (DO-178 rev. B/C)
• Tests
  ◦ Expensive because they run on special hardware
  ◦ Can miss bugs
  ◦ Slow
• Solution: use static analysis
  ◦ NASA V&V program

Safety properties

• Main objectives: no runtime errors
  ◦ buffer overflow
  ◦ null dereference
  ◦ division by zero
  ◦ integer overflow
• Harder objectives:
  ◦ assertions (preconditions, postconditions, invariants)
  ◦ termination
• Certification ⇒ soundness is required
• Abstract interpretation is a good candidate
• Runtime errors can be security vulnerabilities!

Abstract Interpretation

• Based on the concrete semantics of your program
• Automatic formal proof
• Sound approximation of reachable states

Abstract Interpretation

[Figure: possible trajectories x(t) of a program P, i.e. semantics(P)]

[Figure: specification(P)]

[Figure: semantics(P) ⊆ specification(P)]


Maxime Arthaud 8/83 


[Figure: using testing, only a few of the possible trajectories are tested]

Abstract Interpretation

[Figure: abstraction of the trajectories, abstraction(P)]

[Figure: abstraction(P) ⊆ specification(P)]

[Figure: semantics(P) ⊆ abstraction(P) ⊆ specification(P)]
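As an illustration of the last picture (not IKOS code; the `interval` type and helper names are hypothetical), here is a minimal C sketch of the simplest abstraction, intervals: a set of integer values is over-approximated by `[lo, hi]`. If the abstract index of an array access lies within the bounds, every concrete trajectory does too; if not, the analyzer can only report a possible error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval abstraction of a set of integers: [lo, hi]. */
typedef struct {
    long lo;
    long hi;
} interval;

/* Join: the smallest interval containing both arguments (merging two paths). */
static interval itv_join(interval a, interval b) {
    interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Abstract addition (overflow is ignored in this sketch). */
static interval itv_add(interval a, interval b) {
    interval r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* abstraction ⊆ specification for "0 <= idx < size": true means every
 * concrete index is in bounds; false only means "possibly out of bounds". */
static bool itv_index_safe(interval idx, long size) {
    assert(idx.lo <= idx.hi);
    return idx.lo >= 0 && idx.hi < size;
}
```

For the loop `for (i = 0; i < 10; i++) a[i] = i;`, the index `i` is abstracted by `[0, 9]` inside the loop body, which is proven in bounds for `int a[10]`, whereas `[1, 10]` (the value of `i + 1`) is not.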

Thank you Pierre-Loïc Garoche

Syllabus

2) IKOS
  • Project
  • Toolchain
  • Demo
  • Results

The IKOS project

• Inference Kernel for Open Static Analyzers
• C++ library for abstract interpretation
• C/C++ static analyzer
• Targets embedded systems
• Analyses:
  ◦ Buffer overflow
  ◦ Division by zero
  ◦ Null dereference
  ◦ Uninitialized variables
  ◦ Prover
• https://ti.arc.nasa.gov/opensource/ikos/

Toolchain

Tool Chain Execution Flow:

    C/C++ code
      → clang → LLVM IR
      → ikos-pp (LLVM opt command) → optimized LLVM IR
      → AR pass (-arbos, LLVM opt command) → AR in s-expr
      → ARBOS {AR parser, analysis plugin framework}
      → analysis results

• ikos-pp
  ◦ ikos-pp is an executable that embeds the LLVM opt command. It applies several LLVM built-in optimizations plus our own optimization passes to produce an intermediate optimized LLVM IR. Using the optimized LLVM IR, we run the LLVM opt command with the -arbos option to translate the optimized LLVM IR to AR.
  ◦ ikos-pp does at least the following optimizations before translating to AR: -mem2reg, -loweratomic, -lowerswitch, and -instnamer
• Abstract domains (IKOS): Interval, Constants, Discrete, Congruence, Interval + Congruence, Octagons, Difference Bounds Matrix, Pointer Analysis
• AR plugin analyzers:
  ◦ BOA: buffer overflow analysis
  ◦ DBZ: intra-procedural integer division-by-zero analysis
  ◦ UVA: inter-procedural uninitialized variable + array analysis
  ◦ NullPtr: inter-procedural null dereference pointer analysis
• Analysis results:
  ◦ Outputs reports to console
  ◦ IKOSView: desktop GUI that queries results stored in SQLite3 database
  ◦ Integrated into web services (such as continuous build + bug tracking systems): SonarQube (using sonar_runner), CodeDX (import results in cppcheck XML format), SWAMP (used in cybersecurity)

LLVM

• Low Level Virtual Machine
• Compiler infrastructure
• Generic assembly language
• Allows language-independent optimization

    C | C++ | Fortran | Ada
              ↓
        LLVM bitcode
              ↓
    x86 | PowerPC | ARM | AR

LLVM

$ cat test.c
#include <stdio.h>

int main(int argc, char** argv) {
    int a[10];
    int i;

    for (i = 0; i < 10; i++) {
        a[i] = i;
    }

    printf("%d\n", a[i - 1]);
    printf("%d\n", a[0]);
    return 0;
}

LLVM

$ clang -c -emit-llvm -O1 -o test.bc test.c
$ opt -S test.bc

define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 {
  %3 = alloca [10 x i32], align 16
  %4 = bitcast [10 x i32]* %3 to i8*
  call void @llvm.lifetime.start(i64 40, i8* %4) #3
  br label %5

; <label>:5:                                      ; preds = %5, %2
  %6 = phi i64 [ 0, %2 ], [ %9, %5 ]
  %7 = getelementptr inbounds [10 x i32], [10 x i32]* %3, i64 0, i64 %6
  %8 = trunc i64 %6 to i32
  store i32 %8, i32* %7, align 4
  %9 = add nuw nsw i64 %6, 1
  %10 = icmp eq i64 %9, 10
  br i1 %10, label %11, label %5

; <label>:11:                                     ; preds = %5
  %12 = getelementptr inbounds [10 x i32], [10 x i32]* %3, i64 0, i64 9
  %13 = load i32, i32* %12, align 4
  %14 = tail call i32 (i32, i8*, ...) @__printf_chk(i32 1,
          i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %13) #3
  %15 = getelementptr inbounds [10 x i32], [10 x i32]* %3, i64 0, i64 0
  %16 = load i32, i32* %15, align 16
  %17 = tail call i32 (i32, i8*, ...) @__printf_chk(i32 1,
          i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %16) #3
  call void @llvm.lifetime.end(i64 40, i8* nonnull %4) #3
  ret i32 0
}

LLVM

Control flow graph of main:

%2:
  %3 = alloca [10 x i32], align 16
  %4 = bitcast [10 x i32]* %3 to i8*
  call void @llvm.lifetime.start(i64 40, i8* %4) #3
  br label %5

%5:
  %6 = phi i64 [ 0, %2 ], [ %9, %5 ]
  %7 = getelementptr inbounds [10 x i32], [10 x i32]* %3, i64 0, i64 %6
  %8 = trunc i64 %6 to i32
  store i32 %8, i32* %7, align 4, !tbaa !3
  %9 = add nuw nsw i64 %6, 1
  %10 = icmp eq i64 %9, 10
  br i1 %10, label %11, label %5

%11:
  %12 = getelementptr inbounds [10 x i32], [10 x i32]* %3, i64 0, i64 9
  %13 = load i32, i32* %12, align 4, !tbaa !3
  %14 = tail call i32 (i32, i8*, ...) @__printf_chk(i32 1, i8* getelementptr
        inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %13) #3
  %15 = getelementptr inbounds [10 x i32], [10 x i32]* %3, i64 0, i64 0
  %16 = load i32, i32* %15, align 16, !tbaa !3
  %17 = tail call i32 (i32, i8*, ...) @__printf_chk(i32 1, i8* getelementptr
        inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %16) #3
  call void @llvm.lifetime.end(i64 40, i8* nonnull %4) #3
  ret i32 0

IKOS-PP

• IKOS pre-processor
• Runs LLVM optimization passes:
  ◦ mem2reg: SSA Form
  ◦ globaldce: Dead Code Elimination
  ◦ globalopt: Global Variable Optimizer
  ◦ simplifycfg: Control Flow Graph Optimizer
  ◦ scalarrepl: Scalar Replacement of Aggregates
  ◦ sccp: Sparse Conditional Constant Propagation
  ◦ loop-simplify: Canonical Form for Loops
  ◦ lcssa: Loop Closed SSA Form
  ◦ loop-deletion: Dead Loop Elimination
  ◦ lowerinvoke: Lower Invoke Instructions
  ◦ lowerswitch: Lower Switch Instructions
• Home-made LLVM passes:
  ◦ Lower Global Variable Initialization
  ◦ Lower Constant Expressions
  ◦ Lower Select Instructions
  ◦ Name Values

Abstract Representation

• Major differences with LLVM:
  ◦ Branching instructions are translated into assertions
  ◦ Memory instructions are byte-oriented
  ◦ Some instructions are removed
• Translation from LLVM to AR using an LLVM pass
• Text representation using s-expressions
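Because branches become assertions on CFG edges, an analyzer can refine variable values along each edge. A minimal C sketch, assuming a hypothetical interval type (empty when `lo > hi`) rather than the IKOS API, of the two edges generated for the loop test of `test.c` (`main.i.0 slt 10` on one edge, `main.i.0 sge 10` on the other):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical interval; lo > hi encodes the empty set (unreachable edge). */
typedef struct {
    long lo;
    long hi;
} interval;

static bool itv_is_bottom(interval x) {
    return x.lo > x.hi;
}

/* Edge guarded by "x slt k": keep only the values strictly below k. */
static interval assume_slt(interval x, long k) {
    assert(k > LONG_MIN);          /* k - 1 must not overflow */
    if (x.hi > k - 1)
        x.hi = k - 1;
    return x;
}

/* Edge guarded by "x sge k": keep only the values greater than or equal to k. */
static interval assume_sge(interval x, long k) {
    if (x.lo < k)
        x.lo = k;
    return x;
}
```

With `main.i.0` in `[0, LONG_MAX]` (standing for `[0, +∞)`), the true edge yields `[0, 9]` and the false edge `[10, LONG_MAX]`; an edge whose result is empty is unreachable.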

($function
  ($name ($main)) ($ty (!8))
  ($params ($p ($name ($main.arg_1)) ($ty (!9))) ($p ($name ($main.arg_2)) ($ty (!10))))
  ($local_vars ($local_var ($var ($name ($main._1)) ($ty (!11)))))
  ($code
    ($entry ($bb_1)) ($exit ($bb_5)) ($unreachable) ($ehresume)
    ($basicblocks
      ($basicblock ($name ($bb_1))
        ($instructions
          ($allocate ($dest ($cst ($localvariableref ($name ($main._1)) ($ty (!11)))))
            ($alloca_ty (!12)) ($array_size ($cst ($constantint ($val (#1)) ($ty (!9)))))
            ($debug ($srcloc ($line (#-1)) ($col (#-1)) ($file (!2)))))
        )
      )
      ($basicblock
        ($name ($*in_bb_1_to_bb_2_phi))
        ($instructions
          ($assign ($lhs ($var ($name ($main.i.0)) ($ty (!9))))
            ($rhs ($cst ($constantint ($val (#0)) ($ty (!9)))))
            ($debug ($srcloc ($line (#16)) ($col (#10)) ($file (!13)))))
        )
      )
      ...
    )
    ($trans
      ($edge ($bb_1) ($*in_bb_1_to_bb_2_phi))
      ($edge ($*in_bb_1_to_bb_2_phi) ($bb_2))
      ($edge ($bb_2) ($*out_bb_2_to_bb_3_icmp_true))
      ($edge ($bb_2) ($*out_bb_2_to_bb_5_icmp_false))
      ($edge ($*in_bb_4_to_bb_2_phi) ($bb_2))
      ...
    )
  )
)

bb_1:
  main._1 = allocate(1, [10 x i32])

*in_bb_1_to_bb_2_phi:
  main.i.0 = 0

*out_bb_2_to_bb_3_icmp_true:
  main.i.0 slt 10
  main._8 = -1

*out_bb_2_to_bb_5_icmp_false:
  main.i.0 sge 10
  main._8 = 0

bb_3:
  main._10 = sext main.i.0
  __v:7 = mul(4, main._10)
  main._11 = ptr_shift(main._1, __v:7)
  memory[main._11] = main.i.0

bb_4:
  main._14 = add(main.i.0, 1)

*in_bb_4_to_bb_2_phi:
  main.i.0 = main._14

bb_5:
  main._17 = sub(main.i.0, 1)
  main._18 = sext main._17
  __v:10 = mul(4, main._18)
  main._19 = ptr_shift(main._1, __v:10)
  main._20 = memory[main._19]
  main._21 = ptr_shift(.str, 0)
  main._22 = call printf(main._21, main._20)
  main._24 = memory[main._1]
  main._25 = ptr_shift(.str, 0)
  main._26 = call printf(main._25, main._24)
  return 0

ARBOS

• Loads an Abstract Representation file (.ar) and applies passes
• Similar to the LLVM opt command
• IKOS passes:
  ◦ ps-opt: Optimize pointer shift statements
  ◦ branching-opt: Optimize the Control Flow Graph
  ◦ inline-init-gv: Inline initialization of global variables in main
  ◦ unify-exit-nodes: Unify exit nodes
  ◦ analyzer: Analyzer pass

ARBOS

bb_1:
  main._1 = allocate(1, [10 x i32])
  main.i.0 = 0
  main.i.0 slt 10
  main._8 = -1

*out_bb_2_to_bb_3_icmp_true:
  main.i.0 slt 10
  main._8 = -1

*out_bb_2_to_bb_5_icmp_false:
  main.i.0 sge 10
  main._8 = 0

bb_3:
  main._10 = sext main.i.0
  __v:7 = mul(4, main._10)
  main._11 = ptr_shift(main._1, __v:7)
  memory[main._11] = main.i.0

bb_4:
  main._14 = add(main.i.0, 1)
  main.i.0 = main._14

bb_5:
  main._17 = sub(main.i.0, 1)
  main._18 = sext main._17
  __v:10 = mul(4, main._18)
  main._19 = ptr_shift(main._1, __v:10)
  main._20 = memory[main._19]
  main._21 = ptr_shift(.str, 0)
  main._22 = call printf(main._21, main._20)
  main._24 = memory[main._1]
  main._25 = ptr_shift(.str, 0)
  main._26 = call printf(main._25, main._24)
  return 0

• Liveness analysis
• Pointer analysis
• Memory analysis combining:
  ◦ Numerical analysis
  ◦ Pointer analysis
  ◦ Uninitialized variable analysis
  ◦ Null pointer analysis
• Checkers:
  ◦ buffer overflow
  ◦ division by zero
  ◦ null dereference
  ◦ uninitialized variables
  ◦ assertion prover
• Store results in a SQLite database

• The toolchain is launched via a Python script
• Generates reports in different formats:
  ◦ Console (gcc style)
  ◦ JSON
  ◦ XML
  ◦ etc.
• The output database is reusable (using ikos-render)

Demo

Aeroquad - The Open Source Quadcopter

• Code size:
  ◦ lines of code: 167k
  ◦ bitcode instructions: 4634
• Time stats:
  ◦ arbos: 1 min 51.888 sec
  ◦ ikos-pp: 0.126 sec
  ◦ llvm-to-ar: 0.898 sec
• Summary:
  ◦ number of checks: 2908
  ◦ number of unreachable checks: 46 (1.6%)
  ◦ number of safe checks: 2688 (92.4%)
  ◦ number of definite unsafe checks: 0
  ◦ number of warnings: 174 (5.9%)

Aeroquad - The Open Source Quadcopter

• Writes at specific addresses:

    *(0x42) = x;

• False positives on loops with casts:

    for (byte axis = 0; axis < 3; axis++) {
        accelSample[axis] = 0;
    }

• Tricky array indexing:

    static byte receiverPin[6] = {2, 5, 6, 4, 7, 8};
    pinData[receiverPin[channel]].edge = FALLING_EDGE;

Paparazzi - Autopilot System for UAV

• Code size:
  ◦ lines of code: 23k
  ◦ bitcode instructions: 4436
• Time stats:
  ◦ arbos: 1 min 2.930 sec
  ◦ ikos-pp: 0.132 sec
  ◦ llvm-to-ar: 1.111 sec
• Summary:
  ◦ number of checks: 2372
  ◦ number of unreachable checks: 352 (14.8%)
  ◦ number of safe checks: 2020 (85.2%)
  ◦ number of definite unsafe checks: 0
  ◦ number of warnings: 0

GEN2

• Code size:
  ◦ lines of code: 13k
  ◦ bitcode instructions: 5340
• Time stats:
  ◦ arbos: 2 min 16.161 sec
  ◦ ikos-pp: 0.199 sec
  ◦ llvm-to-ar: 1.358 sec
• Summary:
  ◦ number of checks: 3121
  ◦ number of unreachable checks: 0
  ◦ number of safe checks: 3028 (97.1%)
  ◦ number of definite unsafe checks: 0
  ◦ number of warnings: 93 (2.9%)

MNAV

• Code size:
  ◦ lines of code: 159k
  ◦ bitcode instructions: 2145
• Time stats:
  ◦ arbos: 12.950 sec
  ◦ ikos-pp: 0.056 sec
  ◦ llvm-to-ar: 0.468 sec
• Summary:
  ◦ number of checks: 430
  ◦ number of unreachable checks: 17 (3.9%)
  ◦ number of safe checks: 330 (76.7%)
  ◦ number of definite unsafe checks: 0
  ◦ number of warnings: 83 (19.3%)

CASS

• Time stats:
  ◦ arbos: 1 day 2 hour 17.463 sec
  ◦ ikos-pp: 13.234 sec
  ◦ llvm-to-ar: 24.431 sec
• Summary:
  ◦ number of checks: 254452
  ◦ number of unreachable checks: 33300 (13.0%)
  ◦ number of safe checks: 172521 (67.8%)
  ◦ number of definite unsafe checks: 0
  ◦ number of warnings: 48631 (19.1%)

FLTz - flight simulator with OpenGL displays

• Code size:
  ◦ lines of code: 91k
  ◦ bitcode instructions: 14501
• Time stats:
  ◦ arbos: 5 day 9 hour 27 min 41.459 sec
  ◦ ikos-pp: 25.211 sec
  ◦ llvm-to-ar: 1 min 2.661 sec
• Summary:
  ◦ number of checks: 1302470
  ◦ number of unreachable checks: 72409 (5.5%)
  ◦ number of safe checks: 153312 (11.7%)
  ◦ number of definite unsafe checks: 19 (0.001%)
  ◦ number of warnings: 1076730 (82.6%)

Syllabus

3) Analyses
  • Liveness analysis
  • Pointer analysis
  • Memory analysis
  • Property checking

Liveness analysis

• Mark live and dead variables after each basic block
• Dataflow analysis
• Used to clean up variables in the abstract domain
• Problem for relational domains

Liveness analysis - Algorithm

• Kill - Gen algorithm
  ◦ $\mathit{GEN}[b]$: set of variables used in $b$ before any assignment
  ◦ $\mathit{KILL}[b]$: set of variables that are assigned in $b$
  ◦ $\mathit{GEN}[\mathit{stmt} : y \leftarrow f(x_1, \dots, x_n)] = \{x_1, \dots, x_n\}$
  ◦ $\mathit{KILL}[\mathit{stmt} : y \leftarrow f(x_1, \dots, x_n)] = \{y\}$
  ◦ $\mathit{LIVE}_{in}[b] = \mathit{GEN}[b] \cup (\mathit{LIVE}_{out}[b] \setminus \mathit{KILL}[b])$
  ◦ $\mathit{LIVE}_{out}[b] = \bigcup_{p \in \mathit{succ}(b)} \mathit{LIVE}_{in}[p]$
  ◦ $\mathit{LIVE}_{out}[\mathit{final}] = \emptyset$
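The equations can be solved by iterating them until nothing changes. A minimal C sketch, assuming at most 32 variables numbered by hand and stored as the bits of a `uint32_t`, and a hypothetical `block` structure (this is not the IKOS data structure):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCC 2

/* Hypothetical CFG node; variables are numbered 0..31 and stored as bits. */
typedef struct {
    uint32_t gen;          /* GEN[b]: used in b before any assignment */
    uint32_t kill;         /* KILL[b]: assigned in b */
    int succ[MAX_SUCC];    /* indices of the successor blocks */
    int nsucc;
    uint32_t live_in;
    uint32_t live_out;
} block;

/* Iterates LIVE_in / LIVE_out until a fixpoint is reached.
 * Blocks without successors keep LIVE_out = {} (the final block). */
static void liveness(block *b, int n) {
    for (int i = 0; i < n; i++) {
        b[i].live_in = 0;
        b[i].live_out = 0;
    }
    bool changed = true;
    while (changed) {
        changed = false;
        /* Backward problem: visiting blocks in reverse order usually converges faster. */
        for (int i = n - 1; i >= 0; i--) {
            uint32_t out = 0;
            for (int s = 0; s < b[i].nsucc; s++) {
                assert(b[i].succ[s] >= 0 && b[i].succ[s] < n);
                out |= b[b[i].succ[s]].live_in;           /* union over successors */
            }
            uint32_t in = b[i].gen | (out & ~b[i].kill);  /* GEN ∪ (OUT − KILL) */
            if (in != b[i].live_in || out != b[i].live_out) {
                b[i].live_in = in;
                b[i].live_out = out;
                changed = true;
            }
        }
    }
}
```

On the loop of `test.c`, split into four blocks (initialization, loop test, body `a[i] = i; i = i + 1`, final `printf` of `a[i - 1]`), both `i` and `a` are live at the loop head and nothing is live on entry to the initialization block.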

Liveness analysis - Example

bb_1:
  main._1 = allocate(1, [10 x i32])
  main.i.0 = 0
  main.i.0 slt 10
  main._8 = -1

*out_bb_2_to_bb_3_icmp_true:
  main.i.0 slt 10
  main._8 = -1

*out_bb_2_to_bb_5_icmp_false:
  main.i.0 sge 10
  main._8 = 0

bb_3:
  main._10 = sext main.i.0
  __v:7 = mul(4, main._10)
  main._11 = ptr_shift(main._1, __v:7)
  memory[main._11] = main.i.0

bb_4:
  main._14 = add(main.i.0, 1)
  main.i.0 = main._14

bb_5:
  ...
  return 0

Pointer analysis

• Pointer analysis: what memory locations can a pointer expression refer to?
• Alias analysis: do two pointers refer to the same locations?
• Intraprocedural vs. interprocedural
• Flow-sensitive vs. flow-insensitive
• Context-sensitive vs. context-insensitive

Pointer analysis - Model

How to model memory locations?
• Global variables: use symbolic names (e.g., g)
• Local variables: use symbolic names (e.g., main.x)
• Dynamically allocated memory: use symbolic names?
  ◦ Problem: potentially unbounded number of locations (think about a loop)
  ◦ Solution: use symbolic names with an instruction counter (e.g., blk(1, X))

Pointer analysis - Andersen's algorithm

• Andersen's pointer analysis
  ◦ For each pointer $p$, we call $T_p$ the set of memory locations pointed to by $p$
  ◦ Goal: find $T_p$ for each pointer $p$
  ◦ Idea: view pointer assignments as subset constraints
  ◦ Complexity: $O(n^2)$, worst case $O(n^3)$
  ◦ $p = \&x \Rightarrow T_p \supseteq \{x\}$
  ◦ $p = q \Rightarrow T_p \supseteq T_q$
  ◦ $p = *q \Rightarrow T_p \supseteq *T_q \Leftrightarrow \forall x \in T_q,\ T_p \supseteq O(x)$
  ◦ $*p = q \Rightarrow *T_p \supseteq T_q \Leftrightarrow \forall x \in T_p,\ O(x) \supseteq T_q$
  ◦ where $O(x)$ is the set of locations that the memory location $x$ may point to
• How to solve the constraint system? A fixpoint, of course!
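
Pointer analysis - Andersen's algorithm

Example:
• $p = \&a \Rightarrow T_p \supseteq \{a\}$
• $q = \&b \Rightarrow T_q \supseteq \{b\}$
• $*p = q \Rightarrow *T_p \supseteq T_q$
• $r = \&c \Rightarrow T_r \supseteq \{c\}$
• $s = p \Rightarrow T_s \supseteq T_p$
• $t = *p \Rightarrow T_t \supseteq *T_p$
• $*s = r \Rightarrow *T_s \supseteq T_r$

Exercise: solve it!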

Pointer analysis - Andersen's algorithm

Solution:
• $T_p = \{a\}$
• $T_q = \{b\}$
• $T_r = \{c\}$
• $T_s = \{a\}$
• $T_t = \{b, c\}$
• $O(a) = \{b, c\}$
• $O(b) = \emptyset$
• $O(c) = \emptyset$
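A naive way to reach the fixpoint is to re-apply every constraint until no set grows. The following C sketch does exactly that; it is not the IKOS implementation, and the `constraint` encoding, the 32-variable limit and the manual numbering of pointers and locations are assumptions. `pts[v]` holds $T_v$ for a pointer and $O(v)$ for a memory location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: every variable/location has an index 0..31 and
 * pts[v] is a bitset holding T_v (pointer) or O(v) (memory location). */
typedef enum { ADDR, COPY, LOAD, STORE } kind;

typedef struct {
    kind k;
    int lhs;
    int rhs;
} constraint;

/* Adds the bits of src to *dst and reports whether *dst grew. */
static bool add_all(uint32_t *dst, uint32_t src) {
    uint32_t old = *dst;
    *dst |= src;
    return *dst != old;
}

/* Solves the subset constraints by chaotic iteration up to the least fixpoint. */
static void andersen(const constraint *c, int nc, uint32_t *pts, int nvars) {
    assert(nvars <= 32);
    for (int v = 0; v < nvars; v++)
        pts[v] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nc; i++) {
            int p = c[i].lhs, q = c[i].rhs;
            switch (c[i].k) {
            case ADDR:  /* p = &q : T_p ⊇ {q} */
                changed |= add_all(&pts[p], 1u << q);
                break;
            case COPY:  /* p = q : T_p ⊇ T_q */
                changed |= add_all(&pts[p], pts[q]);
                break;
            case LOAD:  /* p = *q : for all x in T_q, T_p ⊇ O(x) */
                for (int x = 0; x < nvars; x++)
                    if (pts[q] & (1u << x))
                        changed |= add_all(&pts[p], pts[x]);
                break;
            case STORE: /* *p = q : for all x in T_p, O(x) ⊇ T_q */
                for (int x = 0; x < nvars; x++)
                    if (pts[p] & (1u << x))
                        changed |= add_all(&pts[x], pts[q]);
                break;
            }
        }
    }
}
```

Feeding it the seven constraints of the exercise, in order, yields the solution above. Practical implementations avoid re-scanning every constraint by using a worklist and by collapsing cycles in the constraint graph.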



Pointer analysis - Steensgaard's algorithm

• Steensgaard's pointer analysis
  ◦ Idea: view pointer assignments as equality constraints
  ◦ $p = \&x \Rightarrow T_p \supseteq \{x\}$
  ◦ $p = q \Rightarrow T_p = T_q$
  ◦ $p = *q \Rightarrow T_p = *T_q \Leftrightarrow \forall x \in T_q,\ T_p = O(x)$
  ◦ $*p = q \Rightarrow *T_p = T_q \Leftrightarrow \forall x \in T_p,\ O(x) = T_q$
• Question: is it more or less precise? Why?
• Question: complexity?

Pointer analysis - Steensgaard's algorithm

• Steensgaard is less precise than Andersen's algorithm
  ◦ Each equality constraint is equivalent to 2 inclusion constraints
  ◦ Steensgaard's constraint system includes Andersen's constraints
  ◦ Think fixpoint: once you have reached the fixpoint solution of Andersen's system, you keep growing the sets to satisfy the equality constraints
• Complexity: $O(n \log(n))$ (process each constraint once using union-find)

Pointer analysis - Steensgaard's algorithm

Solution:
• $T_p = T_s = \{a\}$
• $T_q = T_r = T_t = O(a) = \{b, c\}$
• $O(b) = \emptyset$
• $O(c) = \emptyset$
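The union-find formulation can be sketched in C as follows. This is a simplified, hypothetical version (not IKOS code): each equivalence class has at most one pointee class, unifying two classes recursively unifies their pointees, and a fresh abstract location is created when a class has no pointee yet. Variables and locations are numbered by hand.

```c
#include <assert.h>

#define MAX_NODES 64

/* Hypothetical union-find state. Nodes are variables or abstract locations;
 * pointee[r] is the class pointed to by the class of r (-1 if none yet). */
static int parent[MAX_NODES];
static int pointee[MAX_NODES];
static int nnodes;

static void st_init(int nvars) {
    assert(nvars >= 0 && nvars <= MAX_NODES);
    nnodes = nvars;
    for (int i = 0; i < nvars; i++) {
        parent[i] = i;
        pointee[i] = -1;
    }
}

/* Representative of x's class (path halving). */
static int find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Fresh abstract location, used when a class has no pointee yet. */
static int fresh(void) {
    assert(nnodes < MAX_NODES);
    parent[nnodes] = nnodes;
    pointee[nnodes] = -1;
    return nnodes++;
}

/* Equality constraint between two classes: merge them, then merge their pointees. */
static void unify(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b)
        return;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    pointee[a] = pa >= 0 ? pa : pb;
    if (pa >= 0 && pb >= 0)
        unify(pa, pb);
}

/* Class pointed to by x's class, created on demand. */
static int deref(int x) {
    x = find(x);
    if (pointee[x] < 0) {
        int f = fresh();
        pointee[x] = f;
    }
    return find(pointee[x]);
}

static void st_addr(int p, int x)  { unify(deref(p), x); }                /* p = &x */
static void st_copy(int p, int q)  { unify(deref(p), deref(q)); }         /* p = q  */
static void st_load(int p, int q)  { unify(deref(p), deref(deref(q))); }  /* p = *q */
static void st_store(int p, int q) { unify(deref(deref(p)), deref(q)); }  /* *p = q */
```

Processing the seven statements of the example puts `b` and `c` in the same class, which is the pointee class of `q`, `r`, `t` and `a`, while `p` and `s` both point to the class of `a`, matching the solution above. Each statement is handled once, which is where the near-linear cost comes from.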

Pointer analysis - IKOS

• IKOS uses Andersen's approach
• Based on Arnaud Venet's paper: « A Scalable Nonuniform Pointer Analysis for Embedded Programs », SAS 2004
• Computes, for each pointer, a points-to set (Andersen) and an offset (intervals): $P \to (\mathcal{P}(A) \cup \{\top\}) \times I$
• Interprocedural
• Flow-insensitive
• Context-insensitive

Memory analysis

• Memory analysis (also called value analysis) based on a reduced product of:
  ◦ Numerical domain for integers (intervals by default)
  ◦ Pointer domain
  ◦ Null pointer domain
  ◦ Uninitialized variable domain
• Floating points are currently ignored
• Based on Antoine Miné's paper: « Field-Sensitive Value Analysis of Embedded C Programs with Union Types and Pointer Arithmetics », LCTES'06
• Interprocedural
• Context-sensitive

Memory analysis - Pointer domain

• Pointer abstract domain
  ◦ $D^\#_p = V \to (\mathcal{P}(A) \cup \{\top\}) \times I$
  ◦ Pointwise order $\sqsubseteq^\#_p$, pointwise union $\sqcup^\#_p$
  ◦ $(D^\#_p, \sqsubseteq^\#_p, \sqcup^\#_p)$ is a lattice
• Galois connection $(\alpha_p, \gamma_p)$ with the concrete semantics
• Reduction with the previous flow-insensitive pointer analysis
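
Memory analysis - Pointer domain

• Abstract operations:
  ◦ $[\![p = \&x]\!]^\#(\rho) = \rho[p \mapsto (\{x\}, [0, 0])]$
  ◦ $[\![p = q + o]\!]^\#(\rho) = \rho[p \mapsto (\mathit{addresses}(\rho(q)), \mathit{offsets}(\rho(q)) + o)]$
  ◦ $[\![*p = q]\!]^\#(\rho) = \rho$
  ◦ $[\![p = *q]\!]^\#(\rho) = \rho[p \mapsto (\top, ]-\infty, +\infty[)]$
• Question: $[\![p == q]\!]^\#(\rho) = ?$
• Question: $[\![p \neq q]\!]^\#(\rho) = ?$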
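
Memory analysis - Null pointer domain

• Null pointer abstract domain
  ◦ $D_n = \{\bot, \mathit{Null}, \mathit{NonNull}, \top\}$
  ◦ $D^\#_n = V \to D_n$
  ◦ $\bot \sqsubseteq^\# \mathit{Null}$, $\bot \sqsubseteq^\# \mathit{NonNull}$, $\mathit{Null} \sqsubseteq^\# \top$, $\mathit{NonNull} \sqsubseteq^\# \top$
  ◦ $\mathit{Null} \sqcup^\# \mathit{NonNull} = \top$
  ◦ $(D^\#_n, \sqsubseteq^\#, \sqcup^\#)$ is a lattice
• Galois connection $(\alpha_n, \gamma_n)$ with the concrete semantics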
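A minimal C sketch of this four-element lattice and of the null dereference check that later relies on it; the names are hypothetical, not IKOS identifiers, and the meet operation is the standard one rather than something defined on the slide. The uninitialized variable domain has exactly the same shape, with Init and Uninit in place of NonNull and Null.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of D_n = {⊥, Null, NonNull, ⊤}. */
typedef enum { N_BOTTOM, N_NULL, N_NONNULL, N_TOP } nullity;

/* Partial order: ⊥ below everything, ⊤ above everything,
 * Null and NonNull incomparable. */
static bool null_leq(nullity a, nullity b) {
    return a == N_BOTTOM || b == N_TOP || a == b;
}

/* Least upper bound: Null ⊔ NonNull = ⊤. */
static nullity null_join(nullity a, nullity b) {
    if (null_leq(a, b)) return b;
    if (null_leq(b, a)) return a;
    return N_TOP;
}

/* Greatest lower bound: Null ⊓ NonNull = ⊥ (contradiction, unreachable). */
static nullity null_meet(nullity a, nullity b) {
    if (null_leq(a, b)) return a;
    if (null_leq(b, a)) return b;
    return N_BOTTOM;
}

typedef enum { CHECK_UNREACHABLE, CHECK_SAFE, CHECK_WARNING, CHECK_ERROR } check_result;

/* Null dereference check on the abstract value of p. */
static check_result check_deref(nullity p) {
    switch (p) {
    case N_BOTTOM:  return CHECK_UNREACHABLE;
    case N_NONNULL: return CHECK_SAFE;
    case N_NULL:    return CHECK_ERROR;    /* p is null on every execution */
    default:        return CHECK_WARNING;  /* p may be null */
    }
}
```

A pointer that is Null on one path and NonNull on the other is ⊤ after the merge, so dereferencing it yields a warning rather than a proof or a definite error.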
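
Memory analysis - Uninitialized variable domain

• Uninitialized variable abstract domain
  ◦ $D_u = \{\bot, \mathit{Init}, \mathit{Uninit}, \top\}$
  ◦ $D^\#_u = V \to D_u$
  ◦ $\bot \sqsubseteq^\# \mathit{Init}$, $\bot \sqsubseteq^\# \mathit{Uninit}$, $\mathit{Init} \sqsubseteq^\# \top$, $\mathit{Uninit} \sqsubseteq^\# \top$
  ◦ $\mathit{Init} \sqcup^\# \mathit{Uninit} = \top$
  ◦ $(D^\#_u, \sqsubseteq^\#, \sqcup^\#)$ is a lattice
• Galois connection $(\alpha_u, \gamma_u)$ with the concrete semantics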

Memory analysis - Memory model

• Question: how to model the memory?
• LLVM is low level, a byte representation is necessary
• The C language is not type safe and is very permissive on casts
• We need to model the following code correctly:

    uint64_t x = 1;
    uint32_t* p = (uint32_t*)&x;
    p += 1;
    uint32_t y = *p;

• By the way, what is y's value?
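The answer depends on the byte order of the target, which is why the analysis needs a byte-level model. A minimal, portable C sketch (function names are hypothetical): it reads the same bytes as the snippet above, but through `memcpy`, since reading a `uint64_t` object through a `uint32_t*` is undefined behavior under C's strict aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies bytes 4..7 of a uint64_t holding 1 into a uint32_t, i.e. the bytes
 * that *(p + 1) reads in the snippet above, without type punning. */
static uint32_t second_word_of_one(void) {
    uint64_t x = 1;
    uint32_t y;
    memcpy(&y, (const unsigned char *)&x + sizeof(uint32_t), sizeof y);
    return y;
}

/* 1 if the least significant byte is stored first (little-endian). */
static int is_little_endian(void) {
    uint16_t v = 1;
    unsigned char first;
    memcpy(&first, &v, 1);
    return first == 1;
}
```

On a little-endian machine such as x86, `y` is 0 (the 1 sits in the first byte of `x`); on a big-endian machine it is 1.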

Memory analysis - Memory model

• Memory model from « Formalizing the LLVM Intermediate Representation for Verified Program Transformations », POPL 2012
• Memory cell: mc = mb(size, byte) | mptr(blk, offset, index) | muninit
• Memory state = (N, B, C)
  ◦ N: next block id
  ◦ $B = \mathbb{Z}^+ \to \mathbb{Z}^+$: block id to block size (bytes)
  ◦ $C = \mathbb{Z}^+ \times \mathbb{Z}^+ \to MC$: (block id, offset in bytes) to memory cell

Memory analysis - Memory model

Example:

    int* p = (int*) malloc(sizeof(int) + sizeof(int*));
    *p = 0x01020304;
    int** q = (int**)(p + 1);
    *q = p + 2;

    blk id | offset | memory cell
    -------+--------+--------------
    0      | 0      | mb(32, 4)
    0      | 1      | mb(32, 3)
    0      | 2      | mb(32, 2)
    0      | 3      | mb(32, 1)
    0      | 4      | mptr(I, 8, 0)
    0      | 5      | mptr(I, 8, 1)
    0      | 6      | mptr(I, 8, 2)
    0      | 7      | mptr(I, 8, 3)

By the way, what architecture could it be?


Memory analysis - Memory abstract domain

• Memory abstract domain
  ◦ Based on Antoine Miné's paper: « Field-Sensitive Value Analysis of Embedded C Programs with Union Types and Pointer Arithmetics », LCTES'06
  ◦ Idea: abstract memory using cells: C(address, offset, size)
  ◦ Each cell is considered as a variable in the underlying abstract domain
  ◦ Cells may overlap
  ◦ $D^\#_{mem} = \mathcal{C} \times D^\#_{underlying}$
  ◦ In IKOS, $D^\#_{underlying} = D^\#_{num} \times D^\#_{ptr} \times D^\#_{null} \times D^\#_{uninit}$
  ◦ Pointwise partial order, pointwise union

Memory analysis - Memory abstract domain

• Abstract operations: forwarded to $D^\#_{underlying}$, except memory read and write
• Memory write (*p = rhs):
  ◦ set to ⊥ if p is null or uninitialized
  ◦ (points_to, offset) = ρ(p)
  ◦ cells = realize_write(points_to, offset)
  ◦ ∀c ∈ cells, strong_update(c, rhs) or weak_update(c, rhs)
• Memory read (lhs = *p):
  ◦ set to ⊥ if p is null or uninitialized
  ◦ (points_to, offset) = ρ(p)
  ◦ cells = realize_read(points_to, offset)
  ◦ ∀c ∈ cells, strong_update(lhs, c) or weak_update(lhs, c)
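The difference between the two kinds of update can be sketched in C with intervals as the underlying domain. This is a simplification, not the IKOS code: it assumes `realize_write` already returned the cells as pointers, and that a strong update is allowed exactly when a single cell is written (a real analyzer also requires that cell to denote one concrete location, not a summary such as an allocation inside a loop).

```c
#include <assert.h>

/* Hypothetical underlying domain: one interval per cell. */
typedef struct {
    long lo;
    long hi;
} interval;

static interval itv_join(interval a, interval b) {
    interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Memory write *p = rhs on the cells selected by realize_write:
 * one cell  -> strong update (the old value is overwritten),
 * several   -> weak update (the old value is kept and joined with rhs),
 * because any given execution writes only one of them. */
static void write_cells(interval *cells[], int ncells, interval rhs) {
    assert(ncells >= 0);
    for (int i = 0; i < ncells; i++) {
        if (ncells == 1)
            *cells[i] = rhs;
        else
            *cells[i] = itv_join(*cells[i], rhs);
    }
}
```

Weak updates are sound but lose precision, which is one reason why precise points-to information matters for the memory analysis.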

Memory analysis - Memory abstract domain

Example:

    int* p = (int*) malloc(sizeof(int) + sizeof(int*));
    *p = 0x01020304;
    int** q = (int**)(p + 1);
    *q = p + 2;

Abstract value at the end:

    ((malloc ↦ {{0, 4}, {4, 4}}),
     (C(malloc, 0, 4) ↦ [0x01020304, 0x01020304]),
     (C(malloc, 4, 4) ↦ (malloc, [8, 8]), p ↦ (malloc, [0, 0]), q ↦ (malloc, [4, 4])),
     (C(malloc, 4, 4) ↦ NonNull, p ↦ NonNull, q ↦ NonNull),
     (C(malloc, 0, 4) ↦ Init, C(malloc, 4, 4) ↦ Init, p ↦ Init, q ↦ Init))

Memory analysis - Memory abstract domain

    static union {
        struct { uint8 al, ah, bl, bh, ... } b;
        struct { uint16 ax, bx, ... } w;
    } regs;

    regs.w.ax = X;              // (1)
    if (!regs.b.ah) {           // (2)
        regs.b.bl = regs.b.al;  // (3)
    } else {                    // (4)
        regs.b.bh = regs.b.al;  // (5)
    }
    // (6)
    regs.b.al = X;              // (7)

Memory analysis - Memory abstract domain

[Figure: cells of regs over byte offsets 0 to 3 at program points (5), (6) and (7)]

Property checking

• Last step: check properties at each statement location
• Checkers:
  ◦ buffer overflow: $0 \le \mathit{offset}$ and $\mathit{offset} + \mathit{read\_size} \le \mathit{buffer\_size}$
  ◦ division by zero: $\mathit{divisor} \neq 0$
  ◦ null dereference: $p \neq \mathit{Null}$
  ◦ uninitialized variable: $v \neq \mathit{Uninit}$
  ◦ prover: $v \neq 0$
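Each check has three possible outcomes on reachable code, which correspond to the "safe", "warning" and "definite unsafe" counts in the results above. A minimal C sketch of the buffer overflow check evaluated on intervals (hypothetical names, not the IKOS checker):

```c
#include <assert.h>

/* Hypothetical interval abstraction, assumed non-empty (lo <= hi). */
typedef struct {
    long lo;
    long hi;
} interval;

typedef enum { CHECK_SAFE, CHECK_WARNING, CHECK_ERROR } check_result;

/* Buffer overflow check: 0 <= offset and offset + read_size <= buffer_size,
 * with offset and buffer_size (in bytes) given as intervals. */
static check_result check_access(interval offset, long read_size, interval buffer_size) {
    assert(offset.lo <= offset.hi && buffer_size.lo <= buffer_size.hi);
    /* The property holds for every concrete value: proven safe. */
    if (offset.lo >= 0 && offset.hi + read_size <= buffer_size.lo)
        return CHECK_SAFE;
    /* The property fails for every concrete value: definite error. */
    if (offset.hi < 0 || offset.lo + read_size > buffer_size.hi)
        return CHECK_ERROR;
    /* Otherwise the analysis cannot decide: report a warning. */
    return CHECK_WARNING;
}
```

For `int a[10]` (40 bytes) read with 4-byte loads, an offset in `[0, 36]` is safe, `[0, 40]` gives a warning, and `[40, 40]` is a definite error.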

Syllabus

4) Miscellaneous
  • Abstract domains implementation
  • Analyzing C++
  • Exception handling
  • Relational abstract domains
  • Function summarization
  • Integer overflow

Abstract domains implementation

• Separate domains ($V \to D$) are implemented with Patricia trees
  ◦ Insertion and removal in $O(\log(n))$
  ◦ Merge in $O(n)$
  ◦ Transformation in $O(n)$
  ◦ Very cheap union!

Analyzing C++

Analyzing C++ is very tricky:
• Heavy chains of function calls because of templates
• The libc++ needs to be modeled
• Need to be precise on pointers for virtual method calls
• Handle exceptions

Work in progress!

Exception handling

bb_1:
  memory[x] = 9
  _Z1fv._2 = call _Z14__ikos_unknownv()

*out_bb_1_to_bb_2_icmp_true:
  _Z1fv._2 ne 0
  _Z1fv._3 = -1

*out_bb_1_to_bb_3_icmp_false:
  _Z1fv._2 eq 0
  _Z1fv._3 = 0

bb_2:
  _Z1fv._5 = call __cxa_allocate_exception(8)
  _Z1fv._6 = bitcast _Z1fv._5
  memory[_Z1fv._6] = $null
  _Z1fv._8 = bitcast _ZTIDn
  __v:6 = call __cxa_throw(_Z1fv._5, _Z1fv._8, $null)
  unreachable

bb_3:
  memory[x] = 0
  return

_unified_exit:

Exception handling

bb_8:
  main._17 = memory[y]

*bb_8_split_icmp_false:
  main._17 ne 0
  main._18 = 0

*bb_8_split_icmp_true:
  main._17 eq 0
  main._18 = -1

*out_bb_8_merge_icmp:
  __v:17 = invoke _Z13__ikos_asserth(main._18)

bb_10:
  landingpad(main._21)
  __v:18 = 0
  __v:18 = add(__v:18, 0)
  main._22 = extract_elem(main._21, __v:18)
  __v:19 = 0
  __v:19 = add(__v:19, 8)
  main._23 = extract_elem(main._21, __v:19)
  ...

bb_11:
  main._25 = bitcast _ZTIPv
  main._26 = call llvm.eh.typeid.for(main._25)
  ...

bb_9:
  ...
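
Exception handling

• $D^\#_{exc} = D^\# \times D^\#$ (state on the normal path $N$, state on the exception path $E$)
• $[\![\mathit{throw}(e)]\!]^\#(N, E) = (\bot, N \sqcup E)$
• $[\![\mathit{landingpad}(e)]\!]^\#(N, E) = (E, \bot)$
• $[\![v = x]\!]^\#(N, E) = ([\![v = x]\!]^\#(N), E)$
• $(N_1, E_1) \sqcup (N_2, E_2) = (N_1 \sqcup N_2, E_1 \sqcup E_2)$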
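These transfer functions can be sketched in C, assuming (hypothetically) that $D^\#$ tracks a single variable with an interval, for instance the value stored in `x` in the first exception handling example above (`memory[x] = 9` before the possible throw, `memory[x] = 0` otherwise). The names are illustrative, not IKOS identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical D#: an interval for a single variable; lo > hi encodes ⊥. */
typedef struct {
    long lo;
    long hi;
} itv;

static itv itv_bottom(void) {
    itv b = { 1, 0 };
    return b;
}

static bool itv_is_bottom(itv a) {
    return a.lo > a.hi;
}

static itv itv_join(itv a, itv b) {
    if (itv_is_bottom(a)) return b;
    if (itv_is_bottom(b)) return a;
    itv r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* D#_exc = D# x D#: normal state n and pending-exception state e. */
typedef struct {
    itv n;
    itv e;
} exc_state;

/* [[throw(e)]](N, E) = (⊥, N ⊔ E) */
static exc_state exc_throw(exc_state s) {
    exc_state r = { itv_bottom(), itv_join(s.n, s.e) };
    return r;
}

/* [[landingpad(e)]](N, E) = (E, ⊥) */
static exc_state exc_landingpad(exc_state s) {
    exc_state r = { s.e, itv_bottom() };
    return r;
}

/* [[v = k]](N, E) = ([[v = k]](N), E): only the normal state changes. */
static exc_state exc_assign(exc_state s, long k) {
    if (!itv_is_bottom(s.n)) {
        s.n.lo = k;
        s.n.hi = k;
    }
    return s;
}

/* (N1, E1) ⊔ (N2, E2) = (N1 ⊔ N2, E1 ⊔ E2) */
static exc_state exc_join(exc_state a, exc_state b) {
    exc_state r = { itv_join(a.n, b.n), itv_join(a.e, b.e) };
    return r;
}
```

After the merge of the throwing and the normal paths, the normal state says `x` is 0 and the exception state says `x` is 9; a landing pad then resumes the analysis from the exception state.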

Relational abstract domains

• Intervals are very imprecise for loops with a non-deterministic bound
• Solution: use a weakly-relational domain, such as the DBM domain
• Based on Antoine Miné's paper: « A New Numerical Abstract Domain Based on Difference-Bound Matrices », PADO, 155-172, 2001.
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Difference-Bound Matrices 


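• Difference-Bound Matrices
  ◦ Weakly-relational abstract domain

• $m_{i,j} \in \mathbb{Z} \cup \{+\infty\}$ encodes the constraint $v_j - v_i \le m_{i,j}$
• $v_0 = 0$, thus $v_i \in [-m_{i,0}, m_{0,i}]$
• Abstract operations require normalization

$$
\begin{pmatrix}
0 & m_{0,1} & m_{0,2} & \cdots & m_{0,n} \\
m_{1,0} & 0 & m_{1,2} & \cdots & m_{1,n} \\
m_{2,0} & m_{2,1} & 0 & \cdots & m_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
m_{n,0} & m_{n,1} & m_{n,2} & \cdots & 0
\end{pmatrix}
$$

• Normalization: $v_i - v_k \le m_{k,i}$ and $v_k - v_j \le m_{j,k}$ $\Rightarrow$ $v_i - v_j \le m_{j,k} + m_{k,i}$, i.e. $m_{j,i} \leftarrow \min(m_{j,i}, m_{j,k} + m_{k,i})$
• Cost $O(n^3)$, $n$ number of variables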
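The normalization rule amounts to the classic Floyd-Warshall shortest-path closure. Below is a minimal Python sketch, assuming a dense list-of-lists matrix where `m[i][j]` bounds $v_j - v_i$ and index 0 stands for the constant $v_0 = 0$; the names `close` and `bounds` are hypothetical, and this is not IKOS's implementation.

```python
import math
from typing import List, Tuple

INF = math.inf
Matrix = List[List[float]]


def close(m: Matrix) -> bool:
    """Normalize m in place with the Floyd-Warshall closure (O(n^3)).

    Convention: m[i][j] bounds v_j - v_i, index 0 is the constant v_0 = 0.
    Returns False when a negative cycle makes the constraint system empty.
    """
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # v_k - v_i <= m[i][k] and v_j - v_k <= m[k][j]  =>  v_j - v_i <= m[i][k] + m[k][j]
                through_k = m[i][k] + m[k][j]
                if through_k < m[i][j]:
                    m[i][j] = through_k
    # A negative diagonal entry means v_i - v_i < 0: no concrete state satisfies the DBM.
    return all(m[i][i] >= 0 for i in range(n))


def bounds(m: Matrix, i: int) -> Tuple[float, float]:
    """Interval of v_i read off a normalized DBM: [-m[i][0], m[0][i]]."""
    return (-m[i][0], m[0][i])
```

For example, with $v_1 = i$, $v_2 = n$ and the constraints $i \ge 0$, $i - n \le -1$, $n \le 10$ (matrix `[[0, INF, 10], [0, 0, INF], [INF, -1, 0]]`), the closure derives $i \le 9$ and $n \ge 1$, which neither constraint states on its own.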
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Variable packing 


• Idea: keep a list of DBMs, where each DBM contains variables that are related to each other

• Union-Find structure to dynamically group related variables

• Normalization cost: O(n) for n DBMs of bounded size


DBM 1: {x, y, u}    DBM 2: {z}    DBM 3: {v, w}
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Variable packing 


• Idea: keep a list of DBMs, where each DBM contains variables that are related to each other

• Union-Find structure to dynamically group related variables

• Normalization cost: O(n) for n DBMs of bounded size


DBM 4: {x, y, u, v, w}    DBM 2: {z}
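The grouping of variables into packs can be sketched with a Union-Find structure. This is a minimal Python sketch, assuming variables are named by strings and that a constraint mentioning several variables forces them into the same DBM; `UnionFind`, `relate` and the merge triggered by relating `u` and `v` are hypothetical (the slides do not say which constraint merged DBM 1 and DBM 3 into DBM 4). Only the grouping is shown; the DBMs themselves are omitted.

```python
from typing import Dict, List, Set


class UnionFind:
    """Disjoint sets of variables; each set corresponds to one DBM (pack)."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # a new variable starts in its own pack
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra  # merge the two packs (their DBMs would be merged too)

    def packs(self) -> List[Set[str]]:
        groups: Dict[str, Set[str]] = {}
        for v in list(self.parent):
            groups.setdefault(self.find(v), set()).add(v)
        return list(groups.values())


def relate(uf: UnionFind, variables: List[str]) -> None:
    """Record that a constraint mentions all `variables`: they must share one DBM."""
    first, *rest = variables
    uf.find(first)  # registers the variable even if it is alone
    for v in rest:
        uf.union(first, v)
```

Relating `x` with `y`, `y` with `u`, `v` with `w`, and registering `z` alone yields the packs {x, y, u}, {z}, {v, w}; a later constraint between `u` and `v` then merges the first and last packs into {x, y, u, v, w}, as in the second figure.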
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Variable packing - Tests 


Pointer analysis using function summarization. 


File                 | DBMs (time) | Size | Var. packing (time) | Size
astree-ex            | 1.01s       | 36   | 0.13s               | 7
test-1               | 0.13s       | 27   | 0.03s               | 4
test-1-unsafe        | 0.13s       | 27   | 0.02s               | 4
test-10              | 0.03s       | 10   | 0.02s               | 4
test-10-unsafe       | 0.03s       | 11   | 0.02s               | 4
paparazzi-microjet   | 3241.14s    | 611  | 158.50s             | 88
gen2                 | > 5h        | ?    | 7817.42s            | 367
aeroquad-servo       | 78.12s      | 71   | 1.33s               | 14
aeroquad-new         | 86.18s      | 65   | 0.76s               | 5
cornell              | 447.06s     | 226  | 2.64s               | 6
sporesate2-spore-pl  | 895.45s     | ?    | 10.29s              | 19
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• Group variables according to heuristics

• Use the gauge domain


Work in progress
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Function summarization


• IKOS uses dynamic inlining: a callee is re-analyzed at every call site

• Idea: analyze each function only once to build a summary
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Function summarization: call graph analysis


• Problem: cycles in the call graph (recursive functions)
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Function summarization: call graph analysis


• Problem: cycles in the call graph (recursive functions)

• Strongly connected component analysis
• Topological order
  ◦ Bottom-up analysis (from the leaves to the root)
  ◦ Top-down analysis (from the root to the leaves)
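The slide names SCC analysis and a topological order without giving an algorithm. As a minimal sketch, assuming a call graph given as a Python dict from caller to callees, Tarjan's algorithm (our choice, not necessarily what IKOS uses) emits SCCs callees-first, which is directly a bottom-up order; reversing the list gives the top-down order. The function name `tarjan_scc` is hypothetical.

```python
from typing import Dict, List, Set


def tarjan_scc(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the SCCs of `graph` (caller -> callees), callees first (bottom-up order)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    sccs: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:          # tree edge: recurse into the callee
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:         # back edge: w is part of the current cycle
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC: pop the whole component
            scc: List[str] = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs
```

For a call graph where `main` calls `f` and `g`, `f` calls `g`, and `g` and `h` call each other, the bottom-up order is {g, h}, {f}, {main}. Functions in the same SCC (here the mutually recursive `g` and `h`) have to be analyzed together, for instance with a fixpoint over their summaries.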
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Function summarization: memory analysis


• Need a way to express the effect of a function call on memory
  ◦ In particular on global variables and pointer parameters
  ◦ Relation between the input memory state and the output memory state

• Idea: introduce input cells and output cells

x = x + 1  becomes  Cell{x, 0, 4, Out} = Cell{x, 0, 4, In} + 1
(the cell of x at offset 0, of size 4 bytes)
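A toy version of an input/output-cell summary and its use at a call site can be sketched as follows. This is a minimal Python sketch under loudly stated assumptions: the summary format (each output cell equals an input cell plus a constant), the caller state (an interval per (base, offset, size) location) and the names `Cell`, `apply_summary` and `INCR_X` are all hypothetical; real summaries are relations in an abstract domain, which this format does not capture, and integer overflow is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Interval = Tuple[int, int]


@dataclass(frozen=True)
class Cell:
    base: str    # memory object, e.g. the global variable x
    offset: int  # byte offset inside the object
    size: int    # size in bytes
    mode: str    # "In" = value at function entry, "Out" = value at function exit


# Hypothetical summary format: output cell -> (input cell, constant added to it).
Summary = Dict[Cell, Tuple[Cell, int]]
# Hypothetical caller state: interval of each (base, offset, size) location at the call site.
State = Dict[Tuple[str, int, int], Interval]


def apply_summary(summary: Summary, pre: State) -> State:
    """Instantiate a summary at a call site (integer overflow is ignored here)."""
    post = dict(pre)  # locations not written by the callee keep their value
    for out_cell, (in_cell, k) in summary.items():
        lo, hi = pre[(in_cell.base, in_cell.offset, in_cell.size)]  # always read the input state
        post[(out_cell.base, out_cell.offset, out_cell.size)] = (lo + k, hi + k)
    return post


# Summary of a function whose body is `x = x + 1`, with x a 4-byte global:
# Cell{x, 0, 4, Out} = Cell{x, 0, 4, In} + 1
INCR_X: Summary = {Cell("x", 0, 4, "Out"): (Cell("x", 0, 4, "In"), 1)}
```

With this sketch, a caller state where x lies in [0, 5] becomes x in [1, 6] after the call, without re-analyzing the callee's body.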
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Function summarization: tests


Buffer overflow analysis using function summarization 


File           | Inlining (time) | Summaries (time) | Warnings | Errors | Lines
astree-ex      | 0.36s           | 0.57s            | 2/2      | 0/0    | 22 (1)
test-1         | 0.14s           | 0.16s            | 0/0      | 0/0    | 22 (1)
test-1-unsafe  | 0.13s           | 0.18s            | 0/0      | 2/2    | 22 (1)
test-10        | 0.10s           | 0.13s            | 0/2      | 0/0    | 20 (3)
paparazzi      | 154.03s         | 110.09s          | 0/0      | 0/0    | 24650 (199)
gen2           | 307.66s         | > 3h             | 195/?    | 0/?    | 22030 (82)
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Integer overflow 


• Problem: LLVM integer types are signedness-agnostic
  ◦ because most instructions are signedness-agnostic: add, sub, mul, etc.

• How can we be both sound and precise?
  ◦ Intervals with infinite precision: imprecise or unsound
  ◦ Assume integers are unsigned: imprecise
  ◦ Assume integers are signed: imprecise
  ◦ Wrapped intervals: Jorge Navas's paper "Signedness-Agnostic Program Analysis: Precise Integer Bounds for Low-Level Code"
  ◦ Domain product: unsigned and signed
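Why a product helps is visible on 8-bit values: [127, 128] is a tight unsigned interval but covers the whole signed range, while [-1, 0] is tight as a signed interval but covers the whole unsigned range. Below is a minimal Python sketch, assuming 8-bit integers and plain intervals in each component; it is a product of unsigned and signed intervals, not the wrapped-interval domain from Navas's paper, and the function names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]
BITS = 8                                   # assumption: 8-bit integers for readability
MOD = 1 << BITS                            # 256
SMIN, SMAX = -(MOD >> 1), (MOD >> 1) - 1   # -128, 127
UMAX = MOD - 1                             # 255


def unsigned_to_signed(u: Interval) -> Interval:
    """Signed view of the bit patterns whose unsigned values lie in u."""
    lo, hi = u
    if hi <= SMAX:       # sign bit clear for every value
        return (lo, hi)
    if lo > SMAX:        # sign bit set for every value
        return (lo - MOD, hi - MOD)
    return (SMIN, SMAX)  # crosses 127 -> 128: only the full signed range is sound


def signed_to_unsigned(s: Interval) -> Interval:
    """Unsigned view of the bit patterns whose signed values lie in s."""
    lo, hi = s
    if lo >= 0:
        return (lo, hi)
    if hi < 0:
        return (lo + MOD, hi + MOD)
    return (0, UMAX)     # crosses -1 -> 0: only the full unsigned range is sound


def reduce_product(u: Interval, s: Interval) -> Tuple[Interval, Interval]:
    """Refine each component of the (unsigned, signed) pair with the other one.

    An empty result (lo > hi) would mean bottom; this sketch does not handle it.
    """
    su = unsigned_to_signed(u)
    us = signed_to_unsigned(s)
    return (max(u[0], us[0]), min(u[1], us[1])), (max(s[0], su[0]), min(s[1], su[1]))
```

For instance, the pair (unsigned [0, 200], signed [0, 100]) is refined to [0, 100] on both sides: the signed component tells the unsigned one that no value reaches 128.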
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Syllabus 


5 Conclusion
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Conclusion


Thank you. Questions?
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